Automatic Extraction and Generation of XML Documents from Financial Reports
Abstract
Web services require XML-formatted data. Manually translating business information from the rapidly expanding volume of documents into XML is labor-intensive and impractical. Computer programs can be built to extract domain-specific facts from web documents and convert them into an XML format. Given a continual feed of web articles, such a system could maintain an up-to-date XML knowledge base to power web services for businesses. In this research, we build a system that automatically extracts information from electronic international corporate financial reports and translates it into XML or XBRL (a well-known XML extension for accounting and financial data).
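The extract-then-convert idea can be illustrated with a minimal sketch. It is not the system described in the abstract: the regular expression, the sample sentence, the function names `extract_facts` and `facts_to_xml`, and the `report`/`fact` element names are all hypothetical, and the output is plain XML rather than a valid XBRL instance.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical pattern for phrases such as "net income of $4.2 million".
# Real financial reports would need far richer extraction rules.
FACT_PATTERN = re.compile(
    r"(?P<label>net income|revenue|total assets) of "
    r"\$(?P<amount>\d+(?:\.\d+)?) (?P<scale>million|billion)",
    re.IGNORECASE,
)

SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_facts(text):
    """Return (label, value) pairs for every matching phrase in the text."""
    facts = []
    for match in FACT_PATTERN.finditer(text):
        # Decimal avoids floating-point rounding when scaling the amount.
        amount = Decimal(match.group("amount"))
        value = int(amount * SCALE[match.group("scale").lower()])
        facts.append((match.group("label").lower(), value))
    return facts


def facts_to_xml(company, facts):
    """Serialize extracted facts into a small, made-up XML vocabulary."""
    root = ET.Element("report", {"company": company})
    for label, value in facts:
        fact = ET.SubElement(
            root, "fact", {"name": label.replace(" ", "_"), "unit": "USD"}
        )
        fact.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "Acme reported revenue of $12.5 million and net income of $4.2 million."
    print(facts_to_xml("Acme", extract_facts(sample)))
```

Keeping extraction and serialization in separate functions means the same extracted facts could later be written against a different schema, such as an XBRL taxonomy, without touching the extraction rules.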
Similar resources
Management of XML Documents in an Integrated Digital Library
We describe a generalized toolset developed by the Perseus Project to manage XML documents in the context of a large, heterogeneous digital library. The system manages multiple DTDs through mappings from elements in the DTD to abstract document structures. The abstraction of document metadata, both structural and descriptive, facilitates the development of application-level tools for knowledge ...
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Given the growing number of XML documents, organizing them effectively so that useful information can be retrieved is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents
This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...
Information Extraction and Automatic Markup for XML Documents
As XML is going to become the standard document format, there is still the legacy problem of large amounts of text (written in the past as well as today) that are not available in this format. In order to exploit the benefits of XML, these legacy texts must be converted into XML. In this chapter, we discuss the issues of automatic XML markup of documents. We give a survey on existing approaches...
Looking at the Web through XML Glasses
The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human and make information accessible to applications, in order to offer automation, inter-operation and Web-awareness among services. To do so, information from Web sources needs to be accessible in a structured way. XML and it...